The score is the p-value of a chi-squared test on the co-occurrence of markers and diseases within abstracts or sentences.
For each marker/disease pair, a 2x2 contingency table is built with four entries: true positives, false negatives, false positives, and true negatives.
● True positives are the number of co-occurrences of the specific marker and disease pair.
● False negatives are the number of matches for the marker that do not contain the specific disease.
● False positives are the number of matches for the disease that do not contain the specific marker.
● True negatives are the number of all other matches, which contain neither the specific disease nor the specific marker.
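The four entries can be derived from simple counts. The following is a minimal sketch, not the tool's actual implementation: the function name `build_table`, its parameters, and the example numbers are hypothetical, and it assumes that a total number of matches is available so that the true negatives can be obtained by subtraction.

```python
from typing import NamedTuple


class ContingencyTable(NamedTuple):
    """The four cells of the 2x2 table for one marker/disease pair."""
    true_positives: int
    false_negatives: int
    false_positives: int
    true_negatives: int


def build_table(cooccurrences, marker_matches, disease_matches, total_matches):
    """Derive the 2x2 table from raw counts (hypothetical helper).

    cooccurrences   -- matches containing both the marker and the disease
    marker_matches  -- all matches containing the marker
    disease_matches -- all matches containing the disease
    total_matches   -- all matches considered (assumed to be known)
    """
    tp = cooccurrences
    fn = marker_matches - cooccurrences    # marker without the disease
    fp = disease_matches - cooccurrences   # disease without the marker
    tn = total_matches - tp - fn - fp      # neither marker nor disease
    if min(tp, fn, fp, tn) < 0:
        raise ValueError("inconsistent counts: a table cell would be negative")
    return ContingencyTable(tp, fn, fp, tn)


# Made-up example numbers, for illustration only.
table = build_table(cooccurrences=30, marker_matches=100,
                    disease_matches=80, total_matches=1000)
print(table)
```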
The chi-squared test checks whether a marker/disease pair co-occurs more often than expected by chance. The lower the score, the higher the reliability of the specific marker/disease pair.
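As an illustration, the p-value for a single 2x2 table can be computed as sketched below. This is a sketch under stated assumptions rather than the tool's actual code: the function name `chi_squared_p_value` is hypothetical, the Pearson statistic is used without Yates' continuity correction (the text does not say whether a correction is applied), and a table with an empty row or column is assigned a p-value of 1.0 by convention. With one degree of freedom, the upper tail of the chi-squared distribution equals `erfc(sqrt(x / 2))`, so only the standard library is needed.

```python
import math


def chi_squared_p_value(tp, fn, fp, tn):
    """P-value of Pearson's chi-squared test for a 2x2 table (hypothetical helper).

    Assumption: no continuity correction is applied.
    """
    n = tp + fn + fp + tn
    # Product of the two row totals and the two column totals.
    denominator = (tp + fn) * (fp + tn) * (tp + fp) * (fn + tn)
    if denominator == 0:
        # An empty row or column carries no evidence of association;
        # returning 1.0 here is a convention chosen for this sketch.
        return 1.0
    statistic = n * (tp * tn - fn * fp) ** 2 / denominator
    # Survival function of the chi-squared distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(statistic / 2.0))


# Made-up counts: the pair co-occurs far more often than chance predicts.
associated = chi_squared_p_value(tp=30, fn=70, fp=50, tn=850)
# Made-up counts: co-occurrence exactly matches the chance expectation.
independent = chi_squared_p_value(tp=10, fn=90, fp=90, tn=810)
print(associated, independent)
```

Because the statistic is symmetric, a pair that co-occurs less often than expected also yields a low p-value; comparing the observed true positives with their expected count is one way to tell the two cases apart.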